SOFTWARE: PRACTICE AND EXPERIENCE, VOL. 26(7), 781-797 (JULY 1996)


Splaysort: Fast, Versatile, Practical* 


ALISTAIR MOFFAT 
Department of Computer Science, The University of Melbourne, Parkville 3052, Australia 


GARY EDDY 
Department of Computer Science, The University of Melbourne, Parkville 3052, Australia 


AND 


OLA PETERSSON 


Department of Computer Science, Lund University, Box 118, S-22100 Lund, Sweden; and Department of 
Mathematics, Statistics, and Computer Science, Växjö University, S-35195 Växjö, Sweden 


SUMMARY 


Adaptivity in sorting algorithms is sometimes gained at the expense of practicality. We give experimental 
results showing that Splaysort, sorting by repeated insertion into a Splay tree, is a surprisingly efficient
method for in-memory sorting. Splaysort appears to be adaptive with respect to all accepted measures 
of presortedness, and it outperforms Quicksort for sequences with modest amounts of existing order. 
Although Splaysort has a linear space overhead, there are many applications for which this is reasonable. 
In these situations Splaysort is an attractive alternative to traditional comparison-based sorting algorithms 
such as Heapsort, Mergesort, and Quicksort. 


KEY WORDS: adaptive sorting; splay tree; splaytree; quicksort; natural merge sort 


INTRODUCTION 


An adaptive algorithm is one which requires fewer resources to solve easy problem instances 
than it does to solve hard ones. For sorting, an adaptive algorithm should run in O(n) time if
presented with a sorted n-sequence, and in O(n log n) time for all n-sequences, with the
time for any particular sequence depending upon the amount of pre-existing order. Mannila¹
established the notion of a measure of presortedness to quantify the disorder in any input 
sequence, and introduced the concept of optimal adaptivity. 

Research into adaptive sorting algorithms (see Moffat and Petersson² for an overview)
has tended to emphasise adaptivity in this asymptotic sense, and, in the search for improved 
algorithms, practicality has sometimes been overlooked. For example, adaptive algorithms 
have often relied upon complex data structures that incur high computational overheads, 
such as finger search trees. Given that part of the motivation for research into adaptive 
algorithms is the assertion that many of the sequences sorted in practice exhibit some 
degree of presortedness, it is interesting to investigate their performance empirically. No
matter how elegant the asymptotic behaviour of an algorithm, unless it can be implemented
to require modest amounts of time and space, it will not be of interest to practitioners.


* A preliminary presentation of this work was made at the 16th Australian Computer Science Conference, Brisbane, February
1993.


CCC 0038-0644/96/070781-17 Received November 1993
©1996 by John Wiley & Sons, Ltd. Revised November 1994 and November 1995

In this paper we consider the adaptivity of Splaysort, the sorting algorithm that results 
when the insertion sort paradigm and the Splay tree dictionary data structure³ are combined.
Single operations on Splay trees are only efficient in an amortised sense, but in trading away 
worst-case efficiency, a substantially simpler implementation is achieved. Splay trees are 
also self-adjusting, and exploit both temporal and spatial locality in sequences of operations. 

It is thus natural to ask if Splaysort is adaptive in any formal sense, and whether it 
is competitive with other in-memory sorting techniques for practical sized problems and 
reasonable models of presortedness. These are the questions addressed in this paper. 

We do not have an answer for the first question. Experimentally, Splaysort gives every 
indication of being adaptive to all currently accepted measures of presortedness, but only 
for a few very restricted situations has a formal analysis been derived. We can, however, 
answer the second question. When applied to random data Splaysort is up to 50-60 per 
cent slower than the fastest available Quicksort implementation, but it becomes superior 
when a relatively small amount of presortedness is introduced. This combination means that 
Splaysort can be used safely in many situations. If there is some presortedness present, it is 
exploited and execution time saved; and if there is not, the time required is only a constant 
factor greater than for Quicksort. 

The main drawback of Splaysort is that it requires O(n) extra storage space to sort a list 
of n items, and thus, if memory is at a premium, the O(log n) extra space Quicksort or the 
O(1) extra space Heapsort must be preferred.⁴ However, there are applications in which
the use of extra memory is essential. One such situation is when long records are being 
sorted. It is much more economical to sort an array of pointers and then move each record 
once than it is to move each record multiple times during the course of an in-situ sorting 
process. 


SPLAYSORT 


The Splay tree dictionary data structure was developed by Sleator and Tarjan in the early 
1980s.³ Compared to other tree structures, the novelty of Splay trees is that they are efficient
in an amortised sense rather than in a worst-case sense. Any particular operation might be 
expensive, but over a sequence of operations it is possible to bound the total running time 
more tightly than by summing the worst-case cost of the individual operations. 

A Splay tree is similar to a binary search tree,⁴ but with one crucial difference: after
each operation on the tree, a sequence of edge rotations is performed
that brings the accessed item to the root of the tree while still retaining the ordering necessary
for searching. The rules governing the sequence of rotations (the splaying of the tree)
are relatively simple, condensing to two normal cases and one terminating case. Moreover,
the program code to perform splaying is significantly more compact than the equivalent
routines for manipulating balanced trees such as AVL trees.

Splaying the tree after every insertion and after every access is not enough to guarantee 
anything other than an O(n) worst-case bound on the cost of individual operations, since 
there is no explicit balance rule such as there is for AVL trees. However, Sleator and Tarjan 
were able to show with their Balance Theorem that over a sequence of m operations on 
a Splay tree of n items the total time required is O((n + m) log n) in the worst case. If
m = Ω(n), a Splay tree is thus no worse asymptotically than the various types of explicitly
balanced search trees. 




Sleator and Tarjan also showed that Splay trees are asymptotically no worse than optimal
binary search trees, and that they achieve this without requiring pre-knowledge of
the distribution of symbols. They are also efficient for access sequences involving either
temporal or spatial locality. Bell and Gupta⁵ have compared Splay trees and AVL trees
in an experimental setting and found that on very skew access patterns Splay trees are 
indeed faster than AVL trees, but that for most sequences the AVL tree should be preferred. 
However, their experiments were concerned more with access operations than insertions 
and so favoured the AVL tree, since there is no re-balancing of an AVL tree during access 
operations, only during insertions. 

Many adaptive sorting algorithms have resulted from the use of the insertion sort paradigm 
and a specialised dictionary data structure. For example, both the A-sort of Mehlhorn⁶
and the Local Insertion Sort of Mannila¹ use finger search trees, exploiting the spatial
locality that is present in some classes of nearly sorted sequences. Indeed, the simplest 
of all insertion sorts, Linear Insertion Sort, makes use of a sorted array as a dictionary 
data structure. Splaysort is the sorting algorithm that results when the items in the input 
sequence are inserted one by one into an initially empty Splay tree and then recovered in 
sorted order by performing an inorder traversal of the final tree. The results of Sleator and 
Tarjan³ are sufficient to show that Splaysort is worst-case optimal, expending O(n log n)
comparisons when sorting an input sequence of n items. 
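The definition above can be illustrated with a short sketch. It is not the implementation measured in this paper: it sorts plain int keys, uses a hypothetical `node` struct rather than parallel pointer arrays, allocates its nodes with malloc, and makes no claim of stability. The splay step follows the usual top-down formulation, and the final pass is a destructive inorder traversal that rotates left children upward so that no recursion stack is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node layout, assumed for this sketch only. */
typedef struct node {
    int key;
    struct node *left, *right;
} node;

/* Top-down splay: returns a new root whose key is equal to, or adjacent
   in sorted order to, `key`. The tree must be non-empty. */
static node *splay(node *t, int key)
{
    node header, *l, *r, *y;
    assert(t != NULL);
    header.left = header.right = NULL;
    l = r = &header;
    for (;;) {
        if (key < t->key) {
            if (t->left == NULL) break;
            if (key < t->left->key) {            /* rotate right */
                y = t->left; t->left = y->right; y->right = t; t = y;
                if (t->left == NULL) break;
            }
            r->left = t; r = t; t = t->left;     /* link into right tree */
        } else if (key > t->key) {
            if (t->right == NULL) break;
            if (key > t->right->key) {           /* rotate left */
                y = t->right; t->right = y->left; y->left = t; t = y;
                if (t->right == NULL) break;
            }
            l->right = t; l = t; t = t->right;   /* link into left tree */
        } else {
            break;
        }
    }
    l->right = t->left;                          /* reassemble the parts */
    r->left = t->right;
    t->left = header.right;
    t->right = header.left;
    return t;
}

/* Insert node n and make it the new root. Equal keys are accepted. */
static node *insert(node *root, node *n)
{
    n->left = n->right = NULL;
    if (root == NULL) return n;
    root = splay(root, n->key);
    if (n->key < root->key) {
        n->left = root->left; n->right = root; root->left = NULL;
    } else {
        n->right = root->right; n->left = root; root->right = NULL;
    }
    return n;
}

/* Sort a[0..n-1]; returns 0 on success, -1 if memory is unavailable. */
int splaysort(int *a, size_t n)
{
    node *pool, *root = NULL, *y;
    size_t i;
    if (n < 2) return 0;
    pool = malloc(n * sizeof *pool);
    if (pool == NULL) return -1;
    for (i = 0; i < n; i++) {
        pool[i].key = a[i];
        root = insert(root, &pool[i]);
    }
    /* Destructive inorder traversal: rotate left children up until the
       root has none, emit the root, then continue in its right subtree. */
    i = 0;
    while (root != NULL) {
        if (root->left != NULL) {
            y = root->left; root->left = y->right; y->right = root; root = y;
        } else {
            a[i++] = root->key;
            root = root->right;
        }
    }
    free(pool);
    return 0;
}
```

On already sorted input each call to `insert` in this sketch finds the previous maximum at the root with no left-to-right descent, which is the constant-time behaviour discussed for Figure 1.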


Figure 1. Insertion of items in sorted order into a Splay tree: (a) before insertion of 23; (b) after insertion, 
before splaying; and (c) after splaying 


It is also not difficult to see that if Splaysort is presented with an initially sorted or
reverse sorted sequence then the running time is O(n). Figure 1 shows such a situation:
despite the ‘bad’ shape of the tree, each insertion takes O(1) time. Hence, in this
rudimentary sense, Splaysort is certainly adaptive.

Cole et al.⁷,⁸ noted this adaptivity, and conjectured that Splaysort is optimally adaptive
with respect to both Inv, the number of inversions, and the superior measure Loc (definitions
of these measures appear below). To date, however, only relatively weak results concerning 
Splaysort have been proved, since the analysis techniques established and used by Sleator 
and Tarjan appear to be incapable of yielding bounds of less than n log n. A tighter analysis
of Splaysort is perhaps the major current challenge in the area of adaptive sorting.

To gauge the adaptivity of Splaysort we counted the number of comparisons required by 
Splaysort on a variety of random nearly sorted inputs. Surprisingly, no category of nearly 
sorted sequence could be found for which Splaysort behaved suboptimally. 

Also unexpected was that Splaysort, even on random data, is fast enough to be considered 
as an alternative to the classical sorting methods — Quicksort, Heapsort, and Mergesort — 
and on some simple categories of nearly sorted lists is clearly superior. It is this practical 
usage of Splaysort that we seek to justify with the discussion and results reported here. The 
following sections describe the implementation of Splaysort that was tested, the experiments 
that were performed and the results that were achieved. 


IMPLEMENTATION 


A variety of sorting algorithms were implemented and tested on both random and nearly 
sorted data. All of the programs were written in C and compiled using gcc -O2 on a Sun
SPARCstation IPC, and supported an interface identical to the C library function qsort:


void sort(void *A, size_t n, size_t length, int (*cmp)(const void *, const void *))


where A points to the first byte of an array holding the data to be sorted; n is the number 
of elements in A; length is the size in bytes of each element; and cmp is a comparison 
function which returns less than zero, zero, or greater than zero, depending on whether its 
first argument is less than, equal to, or greater than its second. All values reported are the 
average of 100 runs at each data point. 
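As an illustration of this interface, the following sketch shows a comparison function of the required form for 4-byte integers, passed to the standard library qsort. The names `cmp_int` and `sort_ints` are hypothetical and are not taken from the programs tested.

```c
#include <assert.h>
#include <stdlib.h>

/* Three-way comparison of two ints, in the form qsort expects:
   negative, zero, or positive for less than, equal, or greater than. */
static int cmp_int(const void *p, const void *q)
{
    int a = *(const int *)p;
    int b = *(const int *)q;
    return (a > b) - (a < b);   /* avoids the overflow risk of a - b */
}

/* Hypothetical wrapper: sort n ints through the generic interface. */
void sort_ints(int *A, size_t n)
{
    assert(A != NULL || n == 0);
    qsort(A, n, sizeof A[0], cmp_int);
}
```

Because every comparison goes through a function pointer, each one costs a function call, which is the cost model assumed throughout the experiments.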

After preliminary experiments, we concentrated on top-down splaying (Sleator and Tarjan, 
Reference 3, page 669), since this was faster than the bottom-up alternative. Two variants 
making use of top-down splaying were implemented. The first was more space economical, 
requiring just 8n bytes of additional storage to sort n records — 4 bytes per item for each 
of a ‘left’ and a ‘right’ pointer, with an implicit correspondence between the records to be 
sorted (held in the array passed as the parameter A) and the tree pointers corresponding to 
those records. 

In the first phase of the sort each record is inserted into the Splay tree using the two arrays 
of pointers. Then, at the completion of the insertion phase, a destructive iterative inorder 
traversal (see, for example, Knuth, Reference 9, Exercise 21, page 330, and answer on 
page 562) is performed, calculating a permutation or destination vector. Finally, in a further 
linear pass, the data is permuted in-situ in the original array (see Sedgewick, Reference 10, 
page 106). 
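The final permutation pass can be sketched as follows. This is an assumed formulation rather than the code used in the experiments: it takes a source vector `src`, where `src[i]` is the index of the record that belongs at position i (a destination vector would need the inverse bookkeeping), and follows each cycle so that every misplaced record is copied into place once, with one extra copy per cycle through a caller-supplied buffer `tmp`. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Rearrange the n records of `length` bytes in A so that position i ends
   up holding the record originally at src[i]. src must be a permutation
   of 0..n-1 and is reset to the identity as a side effect. tmp must have
   room for one record. */
void permute_in_place(void *A, size_t n, size_t length, size_t *src, void *tmp)
{
    char *base = A;
    size_t i, j, k;
    assert(length > 0);
    for (i = 0; i < n; i++) {
        if (src[i] == i) continue;               /* already in place */
        memcpy(tmp, base + i * length, length);  /* lift out cycle leader */
        j = i;
        while (src[j] != i) {
            k = src[j];
            memcpy(base + j * length, base + k * length, length);
            src[j] = j;                          /* mark position as done */
            j = k;
        }
        memcpy(base + j * length, tmp, length);  /* close the cycle */
        src[j] = j;
    }
}
```

The pass is linear in n because each position is marked as done the first time it is filled.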

The implicit correspondence between data records and tree nodes in this 8-byte version 
necessitated a multiplication operation in conjunction with each comparison. These calcu- 
lations proved costly. Indeed, the first Splaysort implementation ran so slowly that there 
seemed no prospect at all of the method being viable, because the integer multiply and 
divide operations on the RISC workstation being used at that time were implemented in 
software rather than hardware. Performing explicit conversions to floating point values to 
make use of the hardware floating point support and then converting back to integers and 
then pointers to access the array, made the program run more than 50 per cent faster. It 
was, however, still significantly slower than the library qsort being used as a reference 
point. Most implementations of Quicksort avoid the multiplication problem by processing 
the data array sequentially during each partitioning step, so that addresses can be calculated 
using additive operations. 




To sidestep the calculation completely, a second variant of top-down splaying was also 
implemented. In this version each node has an extra pointer field to directly store the address 
of the corresponding record. This reduced the running time by a further 25 per cent. Both 
of these two versions — the 8-byte and the 12-byte Splaysorts — require about 200 lines 
of source code. Figure 2 illustrates the use of the three arrays of the 12-byte version. The 
entries in the ‘data pointer’ array are fixed, and always point to the same record in the 
source array A. 




Figure 2. Sorting with Splaysort: (a) the input sequence; (b) the resultant Splay tree after all items have been 
inserted; and (c) the internal data structure representing the Splay tree 


Quicksort was used as a reference point. The library qsort implementation (SunOS 4.1)
was used at first, and, much to our surprise, the 12-byte Splaysort was faster, even when
sorting an array of random integers. In an attempt to find out why, a second implementation
of Quicksort was coded, following the advice given by Sedgewick¹¹ (see also Sedgewick,
Reference 10, Chapter 9) and later by Bentley.¹² This was about 20 per cent faster than the
system qsort. We were then offered access to a third Quicksort implementation, developed
by Bentley and McIlroy. This variant ran a further 20 per cent faster than our own
best effort and close to 40 per cent faster than the original system qsort. It was used
as a benchmark for speed in all of the subsequent experimentation, and all references to
Quicksort below are to this carefully engineered implementation.¹³ Indeed, examination of
the Bentley-McIlroy code was sufficiently instructive that several of their techniques were
incorporated in our own functions.

A pointer-based Quicksort was also coded as a second reference point. This version 
assigns pointers rather than using a byte-by-byte data copy when swapping items, and then, 
in the same manner as the Splaysort, performs an in-place permutation, moving each record 
just once. Again, the strategies articulated by Bentley and McIlroy were used as a basis
for the program. On n records each of b bytes the conventional in-situ Quicksort requires
O(bn log n) time, while pointer-based methods (including Splaysort) take O(n log n + bn)
time, assuming that the comparison of two key fields can be performed in O(1) expected
time. The space and time behaviour of the methods tested is summarised in Table I.


Table I. Expected space (in bytes) and time bounds for Quicksort, Splaysort and Natural Mergesort on random
sequences, assuming n records each of b bytes, and O(1) average time per comparison


Sorting algorithm               Space            Time
Quicksort (in-place)            O(log n)         O(bn log n)
Quicksort (pointer)             4n + O(log n)    O(n log n + bn)
Splaysort (implicit pointer)    8n + O(1)        O(n log n + bn)
Splaysort (explicit pointer)    12n + O(1)       O(n log n + bn)
Natural Mergesort               8n + O(1)        O(n log n + bn)


Also listed in Table I is Natural Mergesort.⁴ In this method the input list is scanned in
a preliminary pass to identify all of the ascending sequences or runs, and then these are
merged in pairs until a single sorted sequence remains. When the input list is already sorted
Natural Mergesort takes O(n) time, and, more generally, when the list is the concatenation
of k sorted sequences, Natural Mergesort sorts it in O(n log k) time. Our implementation
of Natural Mergesort requires four bytes per item for each of two pointer arrays, hence
the 8n byte average overhead. Several other algorithms were tested, including CK-sort;¹⁴
Heapsort and Mergesort;⁴ Adaptive Heapsort; Local Insertion Sort;¹ A-sort;⁶ Melsort;¹⁶
and Bst-sort (repeated insertion into a binary search tree). It was anticipated that of all of 
the methods tested the Natural Mergesort would be the fastest competitor to the Quicksorts, 
since it expends fewer comparisons than the other methods and requires relatively simple 
code and tight loops, both of which are important on typical architectures. This expectation 
was confirmed by the experiments described in the following. 
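A minimal sketch of the run-merging idea is given below. It is an assumption-laden simplification, not the pointer-array implementation described above: it sorts int keys, uses an auxiliary array of n ints, and rediscovers the runs on every pass instead of recording them in a preliminary scan. Each pass is linear and roughly halves the number of runs, so the O(n log k) behaviour on k runs is preserved.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Index one past the end of the ascending run that starts at lo. */
static size_t run_end(const int *a, size_t lo, size_t n)
{
    size_t i = lo + 1;
    assert(lo < n);
    while (i < n && a[i - 1] <= a[i]) i++;
    return i;
}

/* Sort a[0..n-1]; returns 0 on success, -1 if memory is unavailable. */
int natural_mergesort(int *a, size_t n)
{
    int *b;
    size_t lo, mid, hi = 0, i, j, k, merges;
    if (n < 2) return 0;
    b = malloc(n * sizeof *b);
    if (b == NULL) return -1;
    do {
        merges = 0;
        for (lo = 0; lo < n; lo = hi) {
            mid = run_end(a, lo, n);
            if (mid == n) break;              /* unpaired final run */
            hi = run_end(a, mid, n);
            /* Merge runs a[lo..mid) and a[mid..hi) through b. */
            i = lo; j = mid; k = lo;
            while (i < mid && j < hi)
                b[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
            while (i < mid) b[k++] = a[i++];
            while (j < hi) b[k++] = a[j++];
            memcpy(a + lo, b + lo, (hi - lo) * sizeof *a);
            merges++;
        }
    } while (merges > 0);                     /* a pass with no merge: sorted */
    free(b);
    return 0;
}
```

On an already sorted list the first pass finds a single run and performs no merge, so the sketch stops after one linear scan.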

The behaviour of adaptive sorting algorithms has also been investigated recently by P.
McIlroy.¹⁷ His MSES/MSLS mergesort implementation is about 15 per cent faster than our
Natural Mergesort, and is as fast as the new qsort when sorting strings.¹³,¹⁷ It requires
close to the optimal number of comparisons on several broad classes of nearly sorted inputs,
and so is efficient when each comparison is costly. However, the running time of this method
can be O(n log n) both on random lists and many categories of nearly sorted lists, since
the amount of data movement required can be superlinear in the number of comparisons.
We did not include this method in our experiments. Nevertheless, it is very fast on lists
containing short records where there is some amount of presortedness and should also be
considered. Moffat, Petersson and Wormald independently described a similar method.¹⁸
Their tree-based Mergesort requires time linear in the number of comparisons, and so is
optimally adaptive to a range of measures of presortedness.


PRESORTEDNESS 


Before describing the mechanism used to generate random sequences, we first review methods
for quantifying presortedness. Our descriptions follow those of Mannila,¹ and Petersson
and Moffat.¹⁹ In all cases we assume that some n-sequence $X = \langle x_1, x_2, \ldots, x_n \rangle$ is to be
sorted. Note that low values of the measures correspond to nearly sorted sequences and
high values correspond to highly disordered (according to that measure) sequences. For
most measures, a sequence sorted into non-decreasing order has a measure of either zero or one.

Perhaps the most intuitively appealing and accepted measures of presortedness are the 
number of ascending runs in the input, 


$$\mathit{Runs}(X) = |\{\, i : 1 \le i < n \text{ and } x_i > x_{i+1} \,\}| + 1$$

and the number of pairwise inversions,

$$\mathit{Inv}(X) = |\{\, (i, j) : 1 \le i < j \le n \text{ and } x_i > x_j \,\}|$$


Less obvious, but also appealing are Rem, the number of items that must be removed to 
leave a sorted sequence, 


Rem(X) = n — max{k : X has an ascending subsequence of length k} 


and SMS, the least value k such that the input list can be formed by interleaving k monotone 
sequences, 


SMS (X) = min{k : X can be composed as a shuffle of k monotone sequences} 


A number of other measures have also been described, and some of these are discussed in 
the following. 
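To make the definitions concrete, the following sketch computes Runs, Inv and Rem for an array of ints by direct application of the formulas. It is written for clarity rather than speed (Inv and Rem take quadratic time here), it treats an "ascending subsequence" as non-decreasing, and the function names are hypothetical. SMS is omitted from the sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Runs(X): one more than the number of adjacent descents. */
size_t runs(const int *x, size_t n)
{
    size_t i, r = 1;
    assert(n > 0);
    for (i = 0; i + 1 < n; i++)
        if (x[i] > x[i + 1]) r++;
    return r;
}

/* Inv(X): number of pairs i < j with x[i] > x[j]. */
unsigned long inv(const int *x, size_t n)
{
    unsigned long count = 0;
    size_t i, j;
    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            if (x[i] > x[j]) count++;
    return count;
}

/* Rem(X): n minus the length of a longest non-decreasing subsequence.
   Returns (size_t)-1 if memory is unavailable. */
size_t rem(const int *x, size_t n)
{
    size_t *len, best = 0, i, j;
    assert(n > 0);
    len = malloc(n * sizeof *len);
    if (len == NULL) return (size_t)-1;
    for (i = 0; i < n; i++) {
        len[i] = 1;                      /* longest subsequence ending at i */
        for (j = 0; j < i; j++)
            if (x[j] <= x[i] && len[j] + 1 > len[i]) len[i] = len[j] + 1;
        if (len[i] > best) best = len[i];
    }
    free(len);
    return n - best;
}
```

For the sequence 1, 2, 5, 3, 6, 4 these functions give Runs = 3, Inv = 3 and Rem = 2: there are descents after 5 and after 6, the inverted pairs are (5,3), (5,4) and (6,4), and removing 5 and 6 leaves a sorted sequence.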

Mannila considered the problem of how well an algorithm might adapt to any measure,
and proposed the following definitions.¹,¹⁹ Let $S_n$ be the set of permutations of 1...n, and
$\mathcal{T}_n$ the set of binary decision trees for sorting n items. Define

$$C_M(n, k) = \min_{T \in \mathcal{T}_n} \max \{\, \text{the depth of } \pi \text{ in } T : \pi \in S_n \text{ and } M(\pi) \le k \,\}$$

Then an algorithm A that takes time $T_A(X)$ to sort X is optimally adaptive with respect to
M, or M-optimal, if

$$T_A(X) = O(C_M(|X|, M(X)))$$


Note that this definition captures optimality in an asymptotic sense, and does not force any 
algorithm to employ the exact number of comparisons stated by an information-theoretic 
lower bound. 


Table II. Measures of presortedness


Measure    Minimum    Maximum                   C_M(n, k)          References
Runs       1          n                         O(n log k)         1, 4, 20
Inv        0          n(n - 1)/2                O(n log(k/n))      1, 6
Rem        0          n - 1                     O(n + k log k)     1, 14
SMS        1          ⌊√(2n + 1/4) - 1/2⌋       O(n log k)         21, 22


Values of $C_M(n, k)$ for the four measures described so far are shown in Table II. These
bounds reflect the best that any algorithm can hope to do. For example, for every sorting
algorithm there must exist at least one n-sequence of k or fewer runs for which that
algorithm requires Ω(n log k) time. An algorithm might do better on some sequences of k
runs that perhaps have some additional structure, but cannot do better on every sequence of
k runs. Natural Mergesort, which sorts sequences with k runs in O(n log k) time, is one
algorithm that is optimally adaptive with respect to Runs.

We now describe the generator programs used to produce ‘random nearly sorted’ test data. 
To generate random n-sequences with k or fewer runs a random permutation of 1...n was 
divided into k parts by randomly choosing k — 1 positions, and then each part was sorted. 
This was intended to model the situation when several sorted files are concatenated, and the 
result re-sorted. The authors have certainly been guilty of this laziness, using (on a Unix 
system) ‘cat * | sort’ when ‘sort -m *’ might have been used. In fact, if a Runs- 
optimal sorting algorithm is used and there are k files to be merged, the time required is 
O(n log k) anyway. 
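The Runs generator just described might be sketched as follows. This is an illustrative reconstruction, not the generator program used for the experiments: it relies on the C library rand() (so the modulo reduction is slightly biased and n should not exceed RAND_MAX), it allows cut positions to coincide, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

static int cmp_int(const void *p, const void *q)
{
    int a = *(const int *)p, b = *(const int *)q;
    return (a > b) - (a < b);
}

static int cmp_size(const void *p, const void *q)
{
    size_t a = *(const size_t *)p, b = *(const size_t *)q;
    return (a > b) - (a < b);
}

/* Fisher-Yates shuffle of x[0..n-1]. */
static void shuffle(int *x, size_t n)
{
    size_t i, j;
    int t;
    for (i = n; i > 1; i--) {
        j = (size_t)rand() % i;
        t = x[i - 1]; x[i - 1] = x[j]; x[j] = t;
    }
}

/* Fill x[0..n-1] with a permutation of 1..n having k or fewer runs.
   Returns 0 on success, -1 if memory is unavailable. */
int gen_runs(int *x, size_t n, size_t k)
{
    size_t *cut, i;
    assert(k >= 1);
    cut = malloc((k + 1) * sizeof *cut);
    if (cut == NULL) return -1;
    for (i = 0; i < n; i++) x[i] = (int)(i + 1);
    shuffle(x, n);
    cut[0] = 0;                                  /* k - 1 random cut points */
    for (i = 1; i < k; i++) cut[i] = (size_t)rand() % (n + 1);
    cut[k] = n;
    qsort(cut, k + 1, sizeof *cut, cmp_size);
    for (i = 0; i < k; i++)                      /* sort each part */
        qsort(x + cut[i], cut[i + 1] - cut[i], sizeof *x, cmp_int);
    free(cut);
    return 0;
}
```

Since each of the k parts is sorted, a descent can occur only at a boundary between parts, which bounds Runs(X) by k.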

The second generator program produced sequences with kn or fewer inversions, that is,
an average of k or fewer per item. Starting with a sorted sequence, the first √n items were
randomly permuted, contributing at most n/2 inversions to the total. The remaining items
were broken into blocks of k and each block permuted, so that each of these n - √n items
had at most k - 1 inversions and Inv(X), the total, was less than kn. This corresponds
to a situation where groups of records are appended to a file, with each group containing 
items larger than any already in the file, but with no ordering within each group. 
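A sketch of this second generator is given below, under the same caveats as before: it uses the C library rand(), the names are hypothetical, and it is not the program used to produce the test data. The integer square root is computed by a simple loop to keep the example free of floating point.

```c
#include <assert.h>
#include <stdlib.h>

/* Fisher-Yates shuffle of x[0..n-1]. */
static void shuffle(int *x, size_t n)
{
    size_t i, j;
    int t;
    for (i = n; i > 1; i--) {
        j = (size_t)rand() % i;
        t = x[i - 1]; x[i - 1] = x[j]; x[j] = t;
    }
}

/* Fill x[0..n-1] with a permutation of 1..n having fewer than kn
   inversions: the first floor(sqrt(n)) items are permuted freely, and
   the rest are permuted only within consecutive blocks of k items. */
void gen_inv(int *x, size_t n, size_t k)
{
    size_t i, s = 0, len;
    assert(k >= 1);
    for (i = 0; i < n; i++) x[i] = (int)(i + 1);
    while ((s + 1) * (s + 1) <= n) s++;          /* s = floor(sqrt(n)) */
    shuffle(x, s);
    for (i = s; i < n; i += k) {
        len = (n - i < k) ? n - i : k;           /* last block may be short */
        shuffle(x + i, len);
    }
}
```

With n = 100 and k = 4, for example, the first 10 items can contribute at most 45 inversions and each later block at most 6, so the total stays well below kn = 400.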

To generate Rem random sequences, k randomly chosen items were extracted from the 
list 1... n and reinserted in random positions. This models the presortedness that arises in 
a situation where, over a period of time, records in a sorted file are edited, their key values 
changed, and the file eventually re-sorted. All Rem-adaptive algorithms should provide fast 
sorting on these sequences. 

The final generator produced sequences with a bounded value for SMS, to model an 
application in which sorted and reverse sorted sequences from several asynchronous sources 
are interleaved and the result re-sorted. To this end the list 1...n was randomly split into k
parts, and then the k sequences that resulted were randomly interleaved in either ascending 
or descending order without disturbing the relativities within each sequence. 

To assist the visualisation of these sequences, Figure 3 shows, for each of these four 
generation methods, a sequence that was generated with n = 100 and k = 4. In each graph 
the final position of an item is plotted as a function of its position in the input list. 


Figure 3. Example nearly sorted sequences with n = 100, X[i] as a function of i, where (a) Runs(X) ≤ 4; (b)
Inv(X) ≤ 400; (c) Rem(X) ≤ 4; and (d) SMS(X) ≤ 4




RANDOM SEQUENCES 


In the first set of experiments the speed of Splaysort and a variety of other sorting methods 
was measured on random data. Figure 4(a) shows the time taken to sort an array of 4-byte 
integers (microseconds per item) as a function of log n for the two Quicksorts, Natural
Mergesort and Splaysort. For large values of n the fastest method was clearly the in-place 
Quicksort, followed by the pointer Quicksort, Natural Mergesort, and the 12-byte Splaysort. 
None of the other adaptive algorithms implemented were competitive with these four, and 
they varied from being 20 per cent slower to more than 100 per cent slower. 


[Figure 4 plots not reproduced: two panels, (a) and (b), each plotting time per item in microseconds against log n for Splaysort, Natural Mergesort, Pointer Quicksort, and Quicksort.]


Figure 4. Time in microseconds per item to sort a random permutation of n items as a function of log n, where
(a) each item is a 4-byte integer, and n varies from 1024 to 65,536; and (b) each item is an 8-byte floating
point value, and n varies from 1024 to 16,384


Given the particular care taken in the Bentley-McIlroy Quicksort to sort 4-byte objects 
efficiently, these results are hardly surprising. Indeed, this experiment can be regarded as 
maximally discriminating against the pointer-based sorting algorithms. Conversely, longer 
records favour the pointer-based methods. Figure 4(b) shows the same experiment, but 
carried out on an array of 8-byte doubles. The overhead of swapping 8-byte items means that 
the pointer-based methods are more competitive, with the relativity between them preserved. 
On doubles, both Splaysort and Natural Mergesort were faster than the Sedgewick-Bentley
Quicksort¹¹,¹² that had been implemented at first.

Table III shows the observed number of comparisons required per item as a function of
n. For interest, the cost of the old system qsort is also included. The Bentley-McIlroy
Quicksort uses a pseudo-median of nine items as the partition element¹³ instead of a median
of three, and, although this requires extra time per call, sharply reduces both the number
of recursive calls made and the overall average number of comparisons required. Nevertheless,
Natural Mergesort still requires fewer comparisons. These experiments confirm the
experience of Harris, who suggested that Natural Mergesort should be used as an alternative
to Quicksort if extra memory space can be assumed. Data requiring even more costly
comparisons, such as strings or records with multi-component keys, would be to the further
advantage of Natural Mergesort, since it requires fewer comparisons than both Quicksort
and Splaysort.


Table III. Comparisons to sort random data (observed)


Method                       Comparisons
Natural Mergesort            1.01n log₂ n - 0.43n
Bentley-McIlroy Quicksort    1.08n log₂ n - 0.82n
SunOS qsort                  1.20n log₂ n - 1.06n
Splaysort                    1.45n log₂ n - 3.05n


To further show the advantages of allowing the sorting method to use extra space, the 
time required to sort an array of records was also measured. These results are shown in 
Figure 5. In this experiment each record contained a 4-byte string pointer to a word chosen 
randomly from the system dictionary in /usr/dict/words, together with some number of 
4-byte integers filled with random data and not considered to be part of the key. Taking 
b to be the total size of each record, and varying b from four (that is, no extra data, just 
an array of string pointers) to 100, highlighted the cost of data movement in the sorting 
methods that swap data rather than pointers. Either an in-situ or a pointer-based version of 
the Bentley-McIlroy Quicksort was the fastest method, depending primarily on the size of
the records. However, the Splaysort and the Natural Mergesort were both sufficiently fast 
that they outperformed the old system qsort, and on long records, where the overhead of 
the O(n) extra space required is smallest, were only a little slower than a pointer-based 
variant of the new Quicksort. 


PRESORTED SEQUENCES 


Given that Natural Mergesort and Splaysort are acceptably fast even for random sequences, 
the obvious question is how much presortedness is required before they outperform Quicksort.
To establish the answer to this question, a set of experiments was undertaken for each
of the sequence generators already described. It was decided to maximally bias the experiments
in favour of Quicksort by using integer data, as for Figure 4(a). The in-situ Quicksort
was used as a reference point for speed. 

The four graphs of Figure 6 show the results of these experiments. They plot the execution
time, again in microseconds per item, required by Quicksort, Splaysort and Natural
Mergesort, to sort n = 65,536 integers as a function of the logarithm of the value of Runs,
Inv/n, Rem, and SMS respectively.

Figure 6(a) shows the extent to which Runs-adaptivity can be exploited. Both Natural 
Mergesort and Splaysort outperform Quicksort when the number of runs is small, with 
Natural Mergesort the better of the two. This behaviour is not unexpected, given that 
Natural Mergesort is explicitly tailored to achieve Runs-optimality. 

Figure 6(b) shows what happens when the input sequence has a small number of inversions.
An Inv-optimal algorithm consumes O(n log(k/n)) time to process a sequence with
k inversions, and again Splaysort appears to meet this requirement.

[Figure 5 plot not reproduced: time per item in microseconds against bytes per item, from 0 to 100, for Splaysort, Natural Mergesort, Pointer Quicksort, and Quicksort.]


Figure 5. Time in microseconds per item to sort a random permutation of n = 16,384 records with each key an
alphabetic string, plotted as a function of b, the number of bytes in each record, with b varying from 4 to 100


Figure 6(c) shows the result of running the same programs on sequences generated with
a bounded value for Rem. In this case Natural Mergesort requires O(n log k) time on
sequences with Rem(X) = k (that is, it is adaptive with respect to Rem, but not optimally¹⁹)
and the curve shows the same shape as the O(n log k) Natural Mergesort line in Figure 6(a).
In contrast to this, the Splaysort behaviour suggests performance that is O(n +k log k); that 
is, that Splaysort seems to be optimally adaptive with respect to Rem. In this case Splaysort 
outperforms both Quicksort and Natural Mergesort, and does so even when a relatively 
large number of items must be removed to leave a sorted sequence. 

Finally, Figure 6(d) shows the performance of the three algorithms on data that is nearly 
sorted according to SMS; that is, it contains a relatively small number of shuffled sequences. 
Neither Quicksort nor Natural Mergesort is adaptive in any way to this measure, but
again Splaysort displays behaviour indicative of optimal adaptivity, which, in this case, is 
O(n log k). Although Natural Mergesort outperforms Splaysort in Figure 6(a), it handles all 
of the other three kinds of presortedness relatively poorly. 

In many practical sorting operations there will be some amount of existing pre-order of 
one type or another, and Splaysort outperforms Natural Mergesort for Inv, Rem, and SMS 
nearly sorted sequences, and is faster than Quicksort in all nearly sorted situations. It is 
also worth stressing that the adaptivity of Splaysort is achieved with no prior knowledge 
of the type of presortedness (if any) that is present; the sorting method is always the same, 
for all inputs. In this sense, Splaysort is somewhat of a universal sorting algorithm, since 
in an asymptotic sense it appears to do well when any comparison-based sorting algorithm 
can do well. 

[Figure 6 plots not reproduced: four panels plotting time per item in microseconds against (a) log Runs(X), (b) log (Inv(X)/n), (c) log Rem(X), and (d) log SMS(X), for Splaysort, Natural Mergesort, and Quicksort.]


Figure 6. Time in microseconds per item to sort random sequences X of n = 65,536 items, plotted as a
function of log k, with each item a 4-byte integer and k varying from 1 to 65,536 and (a) Runs(X) ≤ k; (b)
Inv(X) ≤ kn; (c) Rem(X) ≤ k; and (d) SMS(X) ≤ k


The experiments have assumed throughout that each comparison is expensive, incurring a
function call. In a situation where the sort is coded for a particular application rather than as a
general purpose routine, the relative effectiveness of the various methods changes. However,
cheaper comparisons tend to favour Splaysort against Natural Mergesort, and Quicksort is
completely non-adaptive. Thus, even in this situation, Splaysort can be considered.


HOW GOOD? 


It was Cole who first commented on the adaptivity of Splaysort, conjecturing Splaysort to 
be both Inv-optimal and adaptive in a more powerful ‘dynamic finger’ sense,⁷,⁸ in the same 
manner as Mannila’s Local Insertion Sort¹ uses a finger search tree to obtain improved 
adaptivity. 

Petersson and Moffat¹⁹ have considered the adaptivity that results from the use of various 
data structures and the insertion sorting paradigm. They introduced three further measures 
Loc, Hist, and Reg. All are defined in terms of $d_{i,j}$, $i > j$, the number of (already inserted) 
items between $x_j$ and $x_i$ at the time of insertion of $x_i$: 


$$d_{i,j} = |\{k : 1 \le k < i \text{ and } \min\{x_i, x_j\} < x_k < \max\{x_i, x_j\}\}| + 1$$


Loosely speaking, the three measures assess the number of items intervening in space 
between temporally adjacent insertions; the number of items intervening in time between 
spatially adjacent insertions; and the minimal two-dimensional distance in space-time 
between the ith item and any previous item: 


$$d_i = d_{i,i-1}$$
$$t_i = \min\{t : 1 \le t < i \text{ and } d_{i,i-t} = 1\}$$
$$r_i = \min\{t + d_{i,i-t} - 1 : 1 \le t < i\}$$


$$\mathrm{Loc}(X) = \prod_{i=2}^{n} d_i$$

$$\mathrm{Hist}(X) = \prod_{i=2}^{n} t_i$$

$$\mathrm{Reg}(X) = \prod_{i=2}^{n} r_i$$
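
To make the three definitions concrete, the following sketch evaluates them directly for a short sequence. It assumes 1-based indices i > j as in the text, takes d(i, j) to be one more than the number of earlier items lying strictly between the two values, and forms each measure as the product over i = 2..n of d_i, t_i, and r_i respectively. The helper names `d` and `measures` are hypothetical, and the code favours clarity over speed (it is cubic in n).

```python
from math import prod


def d(xs, i, j):
    """d_{i,j} for 1-based i > j: items among x_1..x_{i-1} strictly between
    x_j and x_i, plus one."""
    lo, hi = sorted((xs[i - 1], xs[j - 1]))
    return sum(1 for k in range(1, i) if lo < xs[k - 1] < hi) + 1


def measures(xs):
    """Return (Loc(X), Hist(X), Reg(X)) straight from the definitions."""
    ds, ts, rs = [], [], []
    for i in range(2, len(xs) + 1):
        dist = {t: d(xs, i, i - t) for t in range(1, i)}      # t steps back in time
        ds.append(dist[1])                                    # d_i: distance to the previous insertion
        ts.append(min(t for t in dist if dist[t] == 1))       # t_i: age of the nearest neighbour
        rs.append(min(t + dist[t] - 1 for t in dist))         # r_i: combined space-time distance
    return prod(ds), prod(ts), prod(rs)
```

For example, `measures([6, 50, 60, 7, 100, 5])` returns `(30, 20, 12)`: the final item 5 is five positions away in space from the item inserted just before it, and its neighbour 6 was inserted five steps earlier, but the item 7 inserted two steps earlier is only two positions away, so r_6 = 3.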


That is, Loc(X) is a measure of the spatial locality in X; Hist(X) is a measure of the 
temporal locality in X; and Reg(X) is a combination of these two, under which a sequence 
is considered to be nearly sorted if most of the insertions are not too far in space from 
some item that was itself inserted not too long ago in time. For all three of these measures 
the minimum value is one for sorted and reverse sorted lists, and the maximum value is 
O(n!). 

Another measure that has been used recently to capture the behaviour of adaptive 
algorithms is Block(X), the number of ‘blocks’ of adjacent elements that must be permuted²³ 
to sort X. The minimum value of Block(X) is 1, and the maximum value n. Table IV lists 
the values of C_M(n, k) that have been developed for these four measures. 


Table IV. More measures of presortedness 


Measure    C_M(n, k)          References 
Loc        O(n + log k)       1, 19 
Hist       O(n + log k)       19 
Reg        O(n + log k)       19 
Block      O(n + k log k)     23 


Petersson and Moffat also showed that any Reg-optimal algorithm automatically inherits 
optimal adaptivity for all of the measures listed in Tables II and IV, and described a Regional 
Insertion Sort that requires O(n + log Reg(X)) comparisons and O(n log log n + log Reg(X)) 
time.¹⁹ In terms of comparisons this algorithm matches the bound of Table IV and so is 
comparison-optimal with respect to Reg (and hence all of the other measures discussed), 
but the description of a time-optimal algorithm remains an open problem. 

Let us now return to Splaysort and see how it fits into this framework. Using the definitions 
given by Petersson and Moffat, Cole’s two Splaysort conjectures become: 


Splaysort inversion conjecture: Splaysort sorts n-sequence X in time 
O(n log(Inv(X)/n)). 

Splaysort conjecture: Splaysort sorts n-sequence X in time 
O(n + log Loc(X)). 


Cole et al.⁷ proved that if an n-sequence X is created by randomly permuting the sorted 
subsequences $X_i = (il + 1, il + 2, \ldots, il + l)$, where $0 \le i < n/l$ and $l = \log n$, then 
Splaysort sorts X in O(n) time. These ‘log n-block’ sequences form a subset of the set 
of sequences with Block(X) = n/log n, and the result shows, for this subset at least, that 
Splaysort is fast enough to be Block-optimal (Table IV). 
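
The construction of these sequences can be sketched as follows. The generator shuffles the order of the sorted blocks of length l and concatenates them; it assumes that l divides n. The block counter rests on an assumed reading of Block(X), namely the number of maximal stretches of X whose items are also consecutive in sorted order, for distinct items; that reading is consistent with the minimum of 1 and maximum of n stated earlier but is not quoted from the cited definition. Both function names are hypothetical.

```python
import random


def log_n_block_sequence(n, l, rng):
    """Concatenate the sorted blocks (i*l + 1, ..., i*l + l), 0 <= i < n // l,
    in a random order."""
    blocks = [list(range(i * l + 1, i * l + l + 1)) for i in range(n // l)]
    rng.shuffle(blocks)
    return [x for block in blocks for x in block]


def block_count(xs):
    """Assumed Block(X) for a non-empty sequence of distinct items: one plus
    the number of adjacent pairs that are not adjacent in sorted order."""
    rank = {x: r for r, x in enumerate(sorted(xs))}
    return 1 + sum(1 for a, b in zip(xs, xs[1:]) if rank[b] != rank[a] + 1)


# Example: n = 16 and l = log2(16) = 4 give four blocks of four items.
xs = log_n_block_sequence(16, 4, random.Random(1))
```

Because the shuffle may leave two neighbouring blocks in their original order, `block_count(xs)` is at most n/l rather than always exactly n/l.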

The Static Finger Theorem proved by Sleator and Tarjan³ is strong support for the 
conjecture of Inv-optimality, since all that is necessary for Inv-optimality is a finger search tree 
where the largest item in the tree is fingered. However, the amortised analysis of the Static 
Finger Theorem introduces an additional n log n term, which, for Inv-optimality, should be 
n. Furthermore, their Dynamic Finger Conjecture supports the conjecture that Splaysort is 
Loc-optimal, since it is sufficient if the data structure supports fingered insertion in $O(\log d_i)$ 
time; Cole’s⁸ proof of the Dynamic Finger Conjecture provides the bound of the Splaysort 
Conjecture but with n log log n replacing n, and so at time of writing Splaysort is not proven 
to be either Inv- or Loc-optimal. 

The Working Set Theorem³ supports a conjecture that Splaysort is Hist-optimal, since, 
for that degree of adaptivity, it is sufficient if the dictionary data structure can insert each 
item in $O(\log t_i)$ time, where $t_i$ is the number of items inserted since one or other of the 
new item’s neighbours was inserted. There is again, however, an additive n log n term 
resulting from the analysis given in the proof of that theorem. 

These previous results, our own investigation of data structures supporting both temporal 
and spatial locality,” and the observation that in fact no item in a Splay tree is actually 
fingered — the search is the same irrespective of which item is, for the purposes of the 
analysis, being taken to be the fingered item — lead to the following further conjecture as 
to the adaptivity of Splaysort. 


Generalised Splaysort conjecture: Splaysort sorts n-sequence X in time 
O(n + log Reg(X)). 


That is, we hypothesise that Splaysort is optimally adaptive to all currently accepted 
measures of presortedness.¹⁹ As evidence in support of this thesis, experiments have been carried 
out on randomly generated sequences with a bounded amount of Reg-presortedness, taking 
care that the resultant sequences are not nearly sorted by any other measure of 
presortedness. Figure 7 shows some Reg-random sequences where $\mathrm{Reg}(X) < k^n$, for various values 
of k. Figure 7(b), in which k = 4, describes a sequence considered by Reg to have the 
same amount of presortedness as the sequences shown in Figures 3(a), 3(b), and 3(d). The 
sequence in Figure 7(d) is essentially random, and corresponds to the situation where, for 
example, a sequence of 100 elements is allowed as many as 16 runs. 


Figure 7. Example nearly sorted sequences with n = 100, X[i] as a function of i, where (a) $\mathrm{Reg}(X) < 2^n$; (b) 
$\mathrm{Reg}(X) < 4^n$; (c) $\mathrm{Reg}(X) < 8^n$; and (d) $\mathrm{Reg}(X) < 16^n$ 


Figure 8 shows the number of comparisons per item required by Splaysort on Reg-random 
sequences with $\mathrm{Reg}(X) < k^n$ for different combinations of k and n. Since, in Splaysort, the 
running time is directly proportional to the number of comparisons expended, the pattern of 
curves in the graph does not contradict the hypothesised O(n + log Reg(X)) running time, 
which for $\mathrm{Reg}(X) < k^n$ is O(n + n log k), that is, O(1 + log k) comparisons per item. 


CONCLUSION 


It seems reasonable to assume, in some applications at least, that a sorting algorithm may 
use O(n) additional memory and that some presortedness can be expected. Indeed, if long 
records are being sorted, it is prudent to assume that extra space should be used. The results 
described here show that in such situations Splaysort can be employed with the confidence 
that if there is any pre-existing order in the input sequence it will be exploited, and if there 
is not, the sort will be at worst only a constant factor slower than Quicksort. 

We have also conjectured that Splaysort is optimally adaptive with respect to the measure 
Reg, and, as a consequence of this, that it is optimally adaptive with respect to all currently 
accepted measures of presortedness. Although not necessary for practical use, a proof of 
this conjecture would nevertheless be an important result in its own right. 


SOFTWARE 


Program source code for the Splaysort implementation used here is available through 


http://www.cs.mu.oz.au/~alistair/splaysort.c 


ACKNOWLEDGEMENTS 


We thank Peter McIlroy (University of California, Berkeley) and Doug McIlroy (AT&T 
Bell Labs) for the assistance they provided during the course of this investigation, and the 
Australian Research Council for their financial support. We also thank the referees for their 
comments and support. 


[Figure 8: comparisons per item (vertical axis) against log n (horizontal axis), one curve per 
value of k, including k = 32 and k = 64.] 

Figure 8. Comparisons per item for Reg-random sequences, with $\mathrm{Reg}(X) < k^n$ 


REFERENCES 


1. H. Mannila, ‘Measures of presortedness and optimal sorting algorithms’, IEEE Trans. Computers, C-34, 
(4), 318-325 (1985). 

2. A. Moffat and O. Petersson, ‘An overview of adaptive sorting’, Australian Computer Journal, 24, (2), 
70-77 (1992). 

3. D. D. Sleator and R. E. Tarjan, ‘Self-adjusting binary search trees’, J. ACM, 32, (3), 652-686 (1985). 

4. D. E. Knuth, The Art of Computer Programming, Vol. 3: Sorting and Searching, Addison-Wesley, Reading, 
MA, 1973. 

5. J. Bell and G. Gupta, ‘An evaluation of self-adjusting binary search tree techniques’, Software—Practice 
and Experience, 23, (4), 369-382 (1993). 

6. K. Mehlhorn, Data Structures and Algorithms, Vol. 1: Sorting and Searching, Springer-Verlag, Berlin, 1984. 

7. R. Cole, B. Mishra, J. Schmidt and A. Siegel, ‘On the dynamic finger conjecture for splay trees, Part I: 
Splay sorting log n-block sequences’, Technical Report 470, Department of Computer Science, New York 
University, 1989. 

8. R. Cole, ‘On the dynamic finger conjecture for splay trees’, Proc. 22nd Annual ACM Symposium on Theory 
of Computing, 1990, pp. 8-17. 

9. D. E. Knuth, The Art of Computer Programming, Vol. 1: Fundamental Algorithms, Addison-Wesley, 
Reading, MA, second edition, 1973. 

10. R. Sedgewick, Algorithms in C, Addison-Wesley, Reading, MA, second edition, 1990. 

11. R. Sedgewick, ‘Implementing quicksort programs’, Communications of the ACM, 21, 847-857 (1978). 

12. J. L. Bentley, ‘Programming pearls: How to sort’, Communications of the ACM, 27, 287-291 (1984). 

13. J. L. Bentley and M. D. McIlroy, ‘Engineering a sorting function’, Software—Practice and Experience, 23, 
(11), 1249-1265 (1993). 

14. C. R. Cook and D. J. Kim, ‘Best sorting algorithms for nearly sorted lists’, Communications of the ACM, 
23, (11), 620-624 (1980). 

15. C. Levcopoulos and O. Petersson, ‘Adaptive Heapsort’, J. Algorithms, 14, 395-413 (1993). 

16. S. S. Skiena, ‘Encroaching lists as a measure of presortedness’, BIT, 28, (4), 775-784 (1988). 

17. P. McIlroy, ‘Optimistic sorting and information theoretic complexity’, Proc. ACM-SIAM Symposium on 
Discrete Algorithms, Austin, Texas, January 1993, pp. 467-475. 

18. A. Moffat, O. Petersson and N. C. Wormald, ‘A tree-based mergesort’, Acta Informatica, To appear. 
Prel. version in Proc. International Symposium on Algorithms and Computation, pp. 499-508, LNCS 650, 
Springer-Verlag, 1992. 

19. O. Petersson and A. Moffat, ‘A framework for adaptive sorting’, Discrete Applied Mathematics, 59, (2), 
153-179 (1995). 

20. J. D. Harris, ‘Sorting unsorted and partially sorted lists using the Natural Merge Sort’, Software—Practice 
and Experience, 11, (12), 1339-1340 (1981). 

21. C. Levcopoulos and O. Petersson, ‘Sorting shuffled monotone sequences’, Information and Computation, 
112, 37-50 (1994). 

22. A. Brandstadt and D. Kratsch, ‘On partitions of permutations into increasing and decreasing subsequences’, 
J. Information Processing and Cybernetics, 22, (5/6), 263-273 (1986). 

23. S. Carlsson, C. Levcopoulos and O. Petersson, ‘Sublinear merging and Natural Mergesort’, Algorithmica, 
9, 629-648 (1993). 

24. A. Moffat and O. Petersson, ‘Historical searching’, International Journal of Foundations of Computer 
Science, 4, (1), 85-98 (1993). 


